Text segmentation and topic tracking on broadcast news via a hidden Markov model approach

Authors

  • Paul van Mulbregt
  • Ira Carp
  • Larry Gillick
  • Steve Lowe
  • Jon Yamron
Abstract

Continuing progress in the automatic transcription of broadcast speech via speech recognition has raised the possibility of applying information retrieval techniques to the resulting (errorful) text. In this paper we describe a general methodology based on Hidden Markov Models and classical language modeling techniques for automatically inferring story boundaries (segmentation) and for retrieving stories relating to a specific topic (tracking). We will present in detail the features and performance of the Segmentation and Tracking systems submitted by Dragon Systems for the 1998 Topic Detection and Tracking evaluation.
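
The abstract does not give model details, so the following is only a minimal sketch of the general idea, not the Dragon Systems implementation. It assumes each hidden state is a topic with its own smoothed unigram language model, that each sentence is one observation, and that a fixed probability governs switching topics; a story boundary is hypothesized wherever the Viterbi path changes state. All names (`train_unigram`, `segment`, `p_switch`) and the toy data are hypothetical.

```python
import math
from collections import Counter

def train_unigram(texts, vocab, alpha=0.5):
    """Additively smoothed unigram log-probabilities for one topic (assumed model)."""
    counts = Counter(w for t in texts for w in t.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}

def segment(sentences, topic_lms, p_switch=0.1):
    """Viterbi decoding with one hidden topic state per sentence.

    Returns the best topic sequence and the indices at which a new story starts.
    """
    topics = list(topic_lms)
    stay = math.log(1.0 - p_switch)
    # Spread the switching mass evenly over the other topics.
    switch = math.log(p_switch / max(len(topics) - 1, 1))

    def emit(topic, sentence):
        # Log-likelihood of the sentence under the topic's unigram model;
        # out-of-vocabulary words are skipped.
        lm = topic_lms[topic]
        return sum(lm[w] for w in sentence.split() if w in lm)

    def trans(prev, cur):
        return stay if prev == cur else switch

    best = {t: emit(t, sentences[0]) for t in topics}
    backpointers = []
    for sent in sentences[1:]:
        new_best, pointers = {}, {}
        for t in topics:
            prev = max(topics, key=lambda p: best[p] + trans(p, t))
            new_best[t] = best[prev] + trans(prev, t) + emit(t, sent)
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(topics, key=best.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    boundaries = [i for i in range(1, len(path)) if path[i] != path[i - 1]]
    return path, boundaries

# Toy usage with made-up training text for two topics.
training = {
    "weather": ["rain storm wind forecast", "storm flood rain"],
    "sports": ["team goal match score", "match win team"],
}
vocab = {w for texts in training.values() for t in texts for w in t.split()}
lms = {topic: train_unigram(texts, vocab) for topic, texts in training.items()}
stream = ["storm rain forecast", "wind flood", "team match", "goal score win"]
path, boundaries = segment(stream, lms)
print(path, boundaries)
```

Under the same assumptions, tracking could be approximated by scoring each hypothesized story against a language model for the target topic and a background model, and keeping stories whose log-likelihood ratio exceeds a threshold.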


Similar resources

Topic Detection in Read Documents

In this paper, we address the importance of, and the problems involved in, topic annotation in the speech retrieval domain. Having identified the problem, we describe an algorithm developed to perform automatic topic annotation of broadcast news (BN) speech corpora. The approach adopted is based on Hidden Markov Models (HMM) and topic language models, to solve topic segmentation and labelling tasks simult...


Segmentation and Indexation of Broadcast News

This paper describes a topic segmentation and indexation system for broadcast news that is integrated in an alert system for selective dissemination of multimedia information. The goal of this work is to enhance the retrieval and navigation through specific spoken audio segments that have been automatically transcribed, using speech recognition. Our segmentation algorithm is based on simple heu...


Combining Words and Speech Prosody for Automatic Topic Segmentation

We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topic units. The approach combines hidden Markov models, statistical language models, and prosody-based decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach o...


Indexing Broadcast News

This paper describes a topic segmentation and indexation system for broadcast news that is integrated in an alert system for selective dissemination of multimedia information. The goal of this work is to enhance the retrieval and navigation through specific spoken audio segments (stories) that have been automatically transcribed, using speech recognition. Our segmentation algorithm is based on ...


A hidden Markov model approach to text segmentation and event tracking

Continuing progress in the automatic transcription of broadcast speech via speech recognition has raised the possibility of applying information retrieval techniques to the resulting (errorful) text. For these techniques to be easily applicable, it is highly desirable that the transcripts be segmented into stories. This paper introduces a general methodology based on HMMs and on classical langu...




Publication date: 1998